Methods in Clustering

Authors

  • Katelyn Gao
  • Heather Hardeman
  • Edward Lim
  • Cristian Potter
  • Carl Meyer
  • Ralph Abbey

Abstract

Cluster analytics helps analyze the massive amounts of data that have accrued in this technological age. It relies on clustering, or grouping, objects in the data that share similar traits. A key benefit of clustering is that these methods do not require any prior knowledge of the data; hence, cluster analysis usually makes large data sets much easier to interpret. However, one of the major challenges in cluster analytics is determining the exact number of clusters, k, within the data. For methods such as k-means and nonnegative matrix factorization, choosing an appropriate k is important, whereas other methods, such as Reverse Simon-Ando, are less dependent on starting with the correct k. In this paper, we discuss these methods and apply them to several well-known data sets. We then explore techniques for deriving the number of clusters from the data set and, lastly, discuss several points of theoretical interest.
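The abstract stresses that k-means depends on choosing k. The block below is a minimal sketch, assuming Euclidean data and plain Python: Lloyd's k-means iterations (alternating assignment and mean-update steps) with deterministic farthest-first seeding, which is swapped in here for random restarts purely for reproducibility. It reports the within-cluster sum of squares for several values of k; looking for an "elbow" in that curve is one common heuristic for picking k, offered as an illustration and not necessarily one of the techniques the paper evaluates. The toy data set `DATA` and every function name are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points: List[Point], k: int) -> List[Point]:
    """Deterministic seeding (an assumption, not the paper's method): start at
    points[0], then repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def assign(points: List[Point], centroids: List[Point]) -> List[int]:
    """Assignment step: index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j])) for p in points]


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: alternate assignment and mean-update steps until the
    centroids stop moving or max_iter is reached."""
    centroids = farthest_first(points, k)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return assign(points, centroids), centroids


def within_ss(points: List[Point], labels: List[int], centroids: List[Point]) -> float:
    """Within-cluster sum of squared distances (the k-means objective)."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


# Hypothetical toy data: three tight 2x2 groups of points.
DATA: List[Point] = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (0, 10), (0, 11), (1, 10), (1, 11),
]

if __name__ == "__main__":
    # Print the objective for k = 1..5 and inspect where it stops dropping sharply.
    for k in range(1, 6):
        labels, centroids = kmeans(DATA, k)
        print(k, round(within_ss(DATA, labels, centroids), 2))
```

On this toy data the objective falls steeply up to k = 3, where it equals 6.0 (each point lies 0.5 squared units from its group's center), and decreases only slightly for larger k. On real data the elbow is often far less distinct, which is exactly why selecting k is hard.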

Publication date: 2011